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Eric C. Kerrigan and Manfred Morari 

Abstract 

Faster, cheaper, and more power efficient optimization solvers than those currently offered by 
general-purpose solutions are required for extending the use of model predictive control (MPC) to 
resource-constrained embedded platforms. We propose several custom computational architectures for 
different first-order optimization methods that can handle linear-quadratic MPC problems with input, 
input-rate, and soft state constraints. We provide analysis ensuring the reliable operation of the resulting 
controller under reduced precision fixed-point arithmetic. Implementation of the proposed architectures 
in FPGAs shows that satisfactory control performance at a sample rate beyond 1 MHz is achievable even 
on low-end devices, opening up new possibilities for the application of MPC on embedded systems. 

I. Introduction 

Model predictive control (MPC) provides a systematic approach for handling physical con- 
straints for automatic control of cyber-physical systems |TJ, [|2j, often leading to improved control 
performance and reduced tuning effort for new applications. However, the intense computational 
demands imposed by MPC precludes its use in applications that could benefit considerably 
from its advantages, especially in those that have fast required response times and in those that 
must run on resource-constrained, embedded computing platforms with low cost or low power 
requirements. 
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For linearly constrained MPC problems of low dimensionality, one can partially avoid this 
computational burden by precomputing the solution map offline using multi-parametric program- 
ming [3|. In this case, the online controller implementation consists only of region search and 
table look-up procedures. Further work integrating the design of the solution map and embedded 
circuits has further increased the efficiency in performing these operations Q. However, for 
larger problems, this approach quickly becomes impractical, mainly due to substantial memory 
requirements, forcing a return to online optimization methods. 

Recently, there has been significant interest in using first-order methods, both in the primal [[5J 
and dual domains Q-Q, for the online solution of linear-quadratic MPC problems. Compared 
to other solution methods for quadratic programs (QPs) (e.g. active-set or interior-point schemes), 
first-order methods do not require the solution of a linear system of equations at every iteration, 
which is often a limiting factor for embedded platforms with modest computational capability. 
This feature, coupled with the observation that medium-accuracy solutions are often sufficient 



for good control performance [10|, make first-order methods promising candidates for efficient, 
low cost MPC. In addition, first-order methods have certain features that make them amenable 
to fixed-point implementation, they can be efficiently parallelized, and their simplicity invites 
analysis that can guide low-level implementation choices for further efficiency gains. 

There have been several recent efforts to translate innovation in optimization algorithms into 



practical solvers customized for MPC problems. In terms of software, [ 1 1 1, [ 12 1 and [ 13 1 describe 
automatic state-of-the-art code generators for interior-point and first-order solvers, respectively, 



whereas [14| describes a widely used active-set based solver. In all cases, embedded applications 
were the primary target, although the solvers are implemented using double precision floating- 
point arithmetic, which is generally not available or is very expensive in embedded computing 



platforms. In terms of hardware, [15|-[17| describe different custom computing architectures 
for both interior-point and active-set methods using reduced floating-point arithmetic in field 
programmable gate arrays (FPGAs), reporting minor speed-ups or use of expensive devices to 
provide significant acceleration. Although there has been some progress in accelerating the core 
component of these algorithms - solvers for linear equations - using fixed-point arithmetic JT8J, 
extending these results to the other aspects of interior-point or active- set algorithms remains 
challenging. 
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Summary of contribution 

In this paper we focus on practical and theoretical issues for efficient implementation of 
optimization-based control systems on low cost embedded platforms. 
1) Architectures: We present a set of parameterized automatic generators of custom computing 
architectures for solving different types of MPC problems. For input-constrained problems, 
we describe architectures for Nesterov's fast gradient method (first described in the pre- 



liminary publication [19|), and for state-constrained problems we consider architectures 
based on the alternating direction method of multipliers (ADMM). Even if these methods 
are conceptually very different, they share the same computational patterns and similar 
computing architectures can be used to implement them efficiently. These architectures are 
extended to support warm starting procedures and the projection operations required in the 
presence of soft constraints. 

2) Analysis: Since for a reliable operation using fixed-point arithmetic it is crucial to prevent 
overflow errors, we derive theoretical results that guarantee the absence of overflow in all 
variables of the fast gradient method. Furthermore, we present an error analysis of both 
the fast gradient method and ADMM under (inexact) fixed-point computations in a unified 
framework. This analysis underpins the numerical stability of the methods for hardware 
implementations and can be used to determine a priori the minimum number of bits required 
to achieve a given solution accuracy specification, resulting in minimal resource usage. 

3) Implementation: We derive a set of design rules for efficient implementation of the proposed 
methods, such as a scaling procedure for accelerating the convergence of ADMM and 
criteria for determining the size of the Lagrange multipliers. The proposed architectures 
are characterized in terms of the achievable performance as a function of the amount of 
resources available. As a proof of concept, generated solver instances are demonstrated 
for several linear-quadratic MPC problems, reporting achievable controller sampling rates 
in excess of 1 MHz, while the controller can be implemented on a low cost embeddable 
device. 



Outline 

The paper is organized as follows: After a brief summary of the general MPC formulation and 
the different first-order methods in Sections |TT] and IIP we focus on the fixed-point analysis in 
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Section IV We follow with the hardware architectures and performance evaluation in Sections |Vj 
andEJ 

II. Soft-Constrained Model Predictive Control Setup 

Throughout, we address control of a discrete-time linear time-invariant (LTI) system in the 
form 

x + = Ax + Bu, (1) 

where x G W 1 * is the system state and u E IR n " is the system input. The overall design goal is 
to construct a time-invariant (possibly nonlinear) static state feedback controller fj, : R. nx — > M. nu 
such that u = fi(x) stabilizes the system ([l} while simultaneously satisfying a collection of state 
and input constraints in the time domain. 

In standard design methods for constructing linear controllers for systems in the form ([T|), 
the bulk of the computational effort is spent offline in identifying a suitable controller, whose 
online implementation has minimal computing requirements. The inclusion of state and input 
constraints renders most such design methods unsuitable. 

A now standard alternative is to use MPC |TJ, [|2|, which moves the bulk of the required 

computationally effort online and which addresses directly the system constraints. At every 

sampling instant, given an estimate or measurement of the current state of the plant x, an MPC 

controller solves a constrained iV-stage optimal control problem in the form 

1 l^" 1 N 

J*(x) = min -x T N Q N x N + xlQx k + u T k Ru k + 2x T k Su k + y j (a x ■ l T 5 k + a 2 ■ \\S k \\ 2 2 ) 

k=0 k=l 



(2) 

subject to xq = x, 

x k+1 = A d x k + B d u k , k = 0,l,...,N - 1, 

u k eU, k = 0,l,...,N- 1, 

(x k ,S k )eX A , k = 1,2,..., N. 

If an optimal input sequence {^(x)}^ 1 an d state trajectory {x*(x)}(L exists for this problem 
given the initial state x, then an MPC controller can be implemented by applying the control 
input u = u* Q {x). 
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We will assume throughout that the system input constraint set U is defined as a set of interval 
constraints U := {u \ u min <u< u max }- We assume that the system states have both free (index 
set J 7 ), hard-constrained (index set £>) and soft-constrained (index set S) components, i.e. the 
set Xa in ([2]) is defined as 



X/ 



5) G R nx x IRf 1 | xjr free, aw < x B < x max , \x i: - x c>i \ < n + 8 h i G <S j , (3) 



with x c ti G K being the center of the interval constraint of radius rj > for a soft-constrained 
state component. The index sets J 7 , B and S are assumed to be pairwise disjoint and to satisfy 
TUBUS = {1,2,..., n*}. 

We assume throughout that the penalty matrices (Q,Qn) G W lxXnx are positive semidefinite, 
R g ]R n " xri « is positive definite, and S G IR n * xn " is chosen such that the objective function in ((2j) 
is jointly convex in the states and inputs. There is by now a considerable body of literature pj, 
p0| describing conditions on the penalty matrices and/or horizon length N sufficient to ensure 
that the resulting MPC controller is stabilizing (even when no terminal state constraints are 
imposed), and we do not address this point further. For stability conditions for soft-constrained 



problems, the reader is referred to [21 1 and |22| and the references therein. 

If the soft-constrained index set S is nonempty, then a linear-quadratic penalty on the auxiliary 
variables 5 fc G IR+', weighted by positive scalars (<7i,<j 2 ), can be added to the objective. In 
practice, soft constraints are a common measure to avoid infeasibility of the MPC problem ([2]) 
in the presence of disturbances. However, there also exist hard state constraints that can always 
be enforced and cannot lead to infeasibility, such as state constraints arising from remodeling 
of input-rate constraints. For the sake of generality we address both types of state constraints in 
this paper. 

If o\ is chosen large enough, then the optimization problem corresponds to an exact penalty 
reformulation of the associated hard-constrained problem (i.e. one in which the optimal solution 
of ([2]) maintains 5k = if it is possible to do so). An exact penalty formulation preserves the 
optimal behavior of the MPC controller when all constraints can be enforced. We first characterize 
conditions under which a soft constraint penalty function for a convex optimization problem is 
exact. 

Theorem 1 (Exact Penalty Function for Convex Programming [23 Prop. 5.4.5]): Consider the 
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convex problem 

/*:=min/(z) (4) 

zeQ 

subject to gj(z) < , j = 1, 2, . . . , r, 

where / : IR n — > K and #j : R n — )■ R, j — 1, . . . , r, are convex, real- valued functions and Q is a 
closed convex subset of M. n . Assume that an optimal solution z* exists with f(z*) = /*, strong 
duality holds and an optimal Lagrange multiplier vector fi* G for the inequality constraints 
exists. 

i) If cti > 1 1 /x* 1 1 oo and a 2 > 0, then 

r 

f* = min f{z) + V (a x ■ 5 j + a 2 ■ %) (5) 

j'=i 

subject to gj(z) < 5j, 5j > 0, j = 1, 2, . . . , r. 



ii) If o"i > ||/i*||oo and (T2 > 0, the set of minimizers of the penalty reformulation in ([5]) 

coincides with the set of minimizers of the original problem in ([4]). 
Remark 1: In the context of the MPC problem ([2]), the penalty reformulation is exact if the 
penalty parameter o\ is chosen to be greater than the largest Lagrange multiplier for any constraint 
\xi — x c> i\ < ri, % G S, over all feasible initial states x. In general, this bound is unknown a priori 
and is treated as a tuning parameter in the control design. The quadratic penalty parameter a 2 
need not be nonzero for such a penalty formulation to be exact, but the inclusion of a nonzero 
quadratic term is necessary for our numerical stability results under fixed-point arithmetic in 
Section |IVl 

For the sake of notational simplicity, the results of this paper are presented with reference 
to the optimal control problem in regulator form in (Q. However, all of our results generalize 
easily to setpoint tracking problems. 

III. First-Order Solution Methods 

We next describe two different first-order optimization methods for solving the optimal control 
problem ([2]) efficiently. In particular, we apply the primal fast gradient method (FGM) in cases 
where only input-constraints are present, and a dual method based on the alternating direction 
method of multipliers (ADMM) for cases in which both state- and input-constraints are present. 
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A. Input-Constrained MPC Using the Fast Gradient Method 

The fast gradient method is an iterative solution method for smooth convex optimization 
problems first published by Nesterov in the early 80s p4| , which requires the objective function 
to be strongly convex |25J §9.1.2]. The method can be applied to the solution of MPC problem ([2]) 
if the future state variables X{ are eliminated by expressing them as a function of the initial state, 
x, and the future input sequence (so-called condensing |TJ), resulting in the problem 

f*(x) = mmf(z;x) := \z T H F z + z T $x (6) 

subject to z G K, 

where z := (u , . . . , Wjv-i) £ n = Nn u , the Hessian H F e M. nxn is positive definite under 
the assumptions in Section |TTJ. and the feasible set is given as K := U x . . . x U. The current 
state only enters the gradient of the linear term of the objective through the matrix $ e R nxri ^. 

We consider the constant step scheme II of the fast gradient method in p6"| §2.2.3]. Its 
algorithmic scheme for the solution of (|6]), optimized for parallel execution on parallel hardware, 
is given in AlgorithmjTJ Note that the state-independent terms (I—^Hp), i$ and {1+0) can all 
be computed offline and that the product j^&x must only be evaluated once. The core operations 
in Algorithm [T] are the evaluation of the gradient (implicit in line |2|) and the projection operator 
of the feasible set, 7Tk, in line [3] Since for our application the set K is the direct product of the N 
n u -dimensional sets U, it suffices to consider iV independent projections that can be performed 
in parallel. For the specific case of a box constraint on the control input, every such projection 
corresponds to n u scalar projections on intervals, each computable analytically. In this case, the 
fast gradient method requires only multiplication and addition, which are considerably faster and 
use significantly less resources than division when implemented using digital circuits. 

It can be inferred from [26 Theorem 2.2.3] that for every state x, Algorithm [T] generates a 
sequence of iterates {zi} 1 ™™ such that the residuals f(zi\x) — f*(x) are bounded by 



mm 




2(f{z ;x)-f{x)), (7) 



for alH = 0, . . . , J max , where k denotes the condition number of /, or an upper bound of it, given 
by K—L/p,, where L and p are a Lipschitz constant for the gradient of / and convexity parameter 
of /, respectively. Note that the convexity parameter / for a strongly convex quadratic objective 
function as in (|6]) corresponds to the minimum eigenvalue of Hp. Based on this convergence 
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Algorithm 1 Fast gradient method for the solution of MPC problem (|6]) at state x (optimized 
for parallel hardware) 

Require: Initial iterate z eK,y = z , upper (lower) bound L (p > 0) on maximum (minimum) 
eigenvalue of Hessian H F , step size /3 = (vZ — y/p) / {yL + yfp) 
i: for i = to J max — 1 do 
2: U := (/ - \H F )yi - \§x 
3: := 7T K (ti) 

4: J/i+i := (1 + @)z i+1 - /ftZj 

5: end for 



result, which states that the bound exhibits the best of a linear and a sublinear rate, one can 
derive a certifiable and practically relevant iteration bound J max such that the final residual is 
guaranteed to be within a specified level of suboptimality for all initial states arising from a 
bounded set (5J. It can further be proved that there is no other variant of a gradient method with 
better theoretical convergence [ |26| , i.e. the fast gradient method is an optimal gradient method, 
in theory. 

The fast gradient method is particularly attractive for application to MPC in embedded control 
system design due both to the relative ease of implementation and to the availability of strong 
performance certification guarantees. However, its use is limited to cases in which the projection 
operation 7Tk is simple, e.g. in the case of box-constrained inputs. Unfortunately, the inclusion of 
state constraints changes the geometry of the feasible set IK such that the projection subproblem 
is as difficult as the original problem, since the constraints are no longer separable in In the 
next section we therefore describe an alternative solution method in the dual domain that avoids 
these complications, though at the expense of some of the strong certification advantages. 

B. Input- and State-Constrained MPC Using ADMM 

In the presence of state constraints, if one imposes (Q, Qn) £ M. nxXnx to be positive definite, 
the fast gradient method can be used again to solve the dual problem via Lagrange relaxation 
of the equality constraints [|6j. However, in this case the dual function is not strongly concave 
and consequently the convergence speed is severely affected. A quadratic regularizing term can 
be added to the Lagrangian to improve convergence, but this prevents the use of distributed 



9 



operations for computing the gradient of the dual function, adding a significant computational 
overhead. We therefore seek an alternative approach in the dual domain. 

For dual problems we do not work in the condensed format ([6]), but rather maintain the state 
variables x k in the vector of decision variables z := (u , . . . , Wjv-ij ^o, 8 , . . . , xjy, 8^) G M. n , 
n = N(n u + n x + \S\) + n x + |<S|, resulting in the problem 

f*(x) =mmf(z;x) := -z T H A z + z T h (8) 

subject to z G K, Fz = b(x). 

The affine constraint Fz = b(x) models the dynamic coupling of the states x k and u k via the 
state update equation ([T]), and is at the root of the difficulty in projecting the variables z onto 
the constraints in the fast gradient method. 

The alternating direction method of multipliers (ADMM) p7| partitions the optimization 
variables into two (or more) groups to maintain the possibility of decoupled projection. In 
applying ADMM to the specific problem (|6]), we maintain an additional copy y of the original 
decision variables z and solve the problem 

f*(x) = mm f(z,y;x) := \y T H A y + y T h + I A (y]x) + I K (z) + %\\y - z\\ 2 (9) 
z,y I I 

subject to z = y, (10) 

where (z,y) G M 2n contain copies of all input, state and slack variables. The functions I a : 
M. n x W nx — > {0, +00} and I K : R n — )■ {0, +00} are indicator functions for the sets described 
by the equality and inequality constraints, respectively, e.g. 

{0 if Fy = b(x) , 
(ID 
00 otherwise , 

where K := U x . . . x U x Xa X . . . x Xa- The current state x enters the optimization problem 



through (11 ). The inclusion of the regularizing term (p/2) \\y — z\\ 2 has no impact on the solution 
to (|9]) (equivalently ([8])) due to the compatibility constraint y — z, but it does allow one to drop 
the smoothness and strong convexity conditions on the objective function, so that one can solve 
control problems with more general cost functions such as those with 1- or oo-norm stage costs. 

Note that there are many possible techniques for copying and partitioning of variables in 
ADMM. In the context of optimal control, the choice given in ([9]) results in attractive compu- 



tational structures [28 1. 
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The dual problem for ([9)) is given by 

maxg(z/) 

V 

ADMM solves this dual problem using an approximate gradient method by repeatedly carrying 



inf L p (z, y, v) := \y T E A y + y T h + I A (y; x) + I K {z) + u T (y - z) + ^\\y - z\\ 2 
z,y Z Z 



out the steps 



y i+ i := argmin L p {zi,y, , 
y 

z i+1 := axgxma L p (z,y i+ i,Ui) , 

z 

Vi+i ■= Vi + p(Ui+i - z i+ i) . 



(12a) 
(12b) 
(12c) 



The gradient of the dual function is approximated by the expression — Zj+i) in (12c), 

which employs a single Gauss-Seidel pass instead of a joint minimization to allow for decoupled 
computations. Choosing the regularity parameter p also as the step-length arises from Lipschitz 
continuity of the (augmented) dual function. There are at present no universally accepted rules 
for selecting the value of the penalty parameter however, and it is typically treated as a tuning 
parameter during implementation. 

Our overall algorithmic scheme for ADMM for the solution of ([9]) based on the sequence of 



operations ( |12a[ )-( fT2c| ), optimized for parallel execution on parallel hardware, is given in Algo- 
rithm |2j The core computational tasks are the equality-constrained optimization problem ( ]12a| ) 



and the inequality-constrained, but separable, optimization problem ( |12b[ ). 

In the case of the equality-constrained minimization step ( 12a[ ), a solution can be computed 
from the KKT conditions by solving the linear system 



H A + pI F T 
F 



Vi+i 



-h — Ui + pzi 
b{x) 



Mn 


Ml 2 




< 2 


M 22 





-1 



Note that only the vector and not the multiplier A i+1 , arising from the solution of this 
linear system is required for our ADMM method. The most efficient method to solve for y i+ i 
is to invert the (fixed) KKT matrix offline, i.e. to compute 

Ha + pI F t 
F 

and then to obtain y i+1 online from y i+ i = M n (—h — u t + pz { ) + M 12 b(x) as in Line [2] of 
Algorithm [2j Observe that the product Mi 2 b(x) needs to be evaluated only once, and that this 
matrix is always invertible when p > since F has full row rank. 
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Algorithm 2 ADMM for the solution of MPC problem ([6]) at state x (optimized for parallel 
hardware) 

Require: Initial iterate z = z*~, u = where z*~ and v*~ are the shifted solutions at the 
previous time instant (see Section [V]), and p is a constant power of 2. 
i: for % = to J max — 1 do 
2: y i+1 := M n (-h + pzi - Vi) + M 12 b(x) 
3: z i+1 := n K (y i+1 + fa) 
4: v i+1 := py i+ i +Vi- pz w 
5-. end for 



The inequality-constrained minimization step ( |12b[ ) results in the projection operation in Line [3] 



of Algorithm [2} In the presence of soft state constraints, this operation requires independent 
projections onto a truncated two-dimensional cone, which can be efficiently parallelized and 
require no divisions. We describe efficient implementations of this projection operation in parallel 
hardware in Section |V] 

This variant of ADMM is known to converge; see p9f §3.4; Prop. 4.2] for general convergence 
results. More recently, a bound on the convergence rate was established in [ |30| , where it was 
shown that the error in ADMM, for a different error function, decreases as where i is 
the number of iterations. This result still compares unfavorably relative to the known 1/i 2 
convergence rate for the fast gradient method in the dual domain. However, the observed 
convergence behavior of ADMM in practice is often significantly faster than for the fast gradient 



method [27 1 



C. ADMM, Lagrange multipliers and soft constraints 

Despite its generally excellent empirical performance, ADMM can be observed to converge 
very slowly in certain cases. In particular, for MPC problems in the form ([6]), convergence may 
be very slow in those cases where there is a large mismatch between the magnitude of the 



optimal Lagrange multipliers v* for the equality constraint ( [10] ) and the magnitude of the primal 
iterates (zi, yi). The reason is evident from the ADMM multiplier update step ( |12c[ ); the existence 
of very large optimal multipliers v* necessitates a large number of ADMM iterations when the 
difference (zi — yi) remains small at each iteration and p w 1. 
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This effect is of particular concern for MPC problem instances with soft constraints. If one 
denotes by zg those components of z associated with the slack variables {5\, . . . ,5n} (with 
similar notation for yg), then the objective function ft)b features a term o\ ■ l T ys, with the exact 



penalty term a\ typically very large. The equality constraints ( |T0| ) include the matching condition 
zg — ys = 0, with associated Lagrange multiplier ug. Recalling the usual sensitivity interpretation 
of the optimal multiplier Ug, one can conclude that z/| « a\ • 1 in the absence of unusual problem 
scaling 



For soft constrained problems, we avoid this difficulty by rescaling those components of 



the matching condition ( flu] ) to the equivalent condition [l/a\){zg — yg) = 0, which results in 
a rescaling of the associated optimal multipliers to Vg « 1. The aforementioned convergence 
difficulties due to excessively large optimal multipliers are then avoided. 

IV. Fixed-Point Aspects of First-Order Solution Methods 

In this section we first motivate the use of fixed-point arithmetic from a hardware efficiency 
perspective and then isolate potential error sources under this arithmetic. We concentrate on two 
types of errors. For overflow errors we provide analysis to guarantee that they cannot occur 
in the fast gradient method, whereas for arithmetic round-off errors we prove that there is a 
converging upper bound on the total incurred error in either of the two methods. The results we 



obtain hold under the assumptions in Section IV-B and guarantee reliable operation of first-order 
methods on fixed-point platforms. 



A. Fixed-Point Arithmetic and Error Sources 

Modern computing platforms must allow for a wide range of applications that operate on 
data with potentially large dynamic range, i.e. the ratio of the smallest to largest number to 
be represented. For general purpose computing, floating-point arithmetic provides the necessary 
flexibility. A floating-point number consists of a sign bit, a mantissa, and an exponent value 
that moves the binary point with respect to the mantissa. The dynamic range grows doubly 
exponentially with the number of exponent bits, making it possible to represent a wide range 

'if one sets the regularization parameter p = in |9j and 01 = 0, then it can be shown that this approximation becomes 
exact. 
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of numbers with a relatively small number of bits. However, because two operands can have 
different exponents, it is necessary to perform denormalization and normalization operations 
before and after every addition or subtraction, leading to increased resource usage and long 
arithmetic delays. 

In contrast, hardware platforms employing fixed-point numbers use a fixed number of bits for 
the integer and fraction fields, i.e. the exponent does not vary and does not have to be stored. 
Fixed-point computations are the same as with integer arithmetic, hence the digital circuitry is 
simple and fast, leading to greater power efficiency and significant potential for acceleration via 
extra parallelization in a custom hardware implementation. For instance, in a typical modern 
FPGA platform pTJ fixed-point addition takes one clock cycle, whereas a single precision 
floating-point adder would require 14 cycles while using one order of magnitude more resources 
for the same number of bits. 

The benefits of fixed-point arithmetic motivate its use in first-order methods to realize fast and 
efficient implementations of Algorithms [T] and [2] on FPGAs or other low cost and low power 
devices with no floating-point support, such as embedded microcontrollers, fixed-point digital 
signal processors (DSPs) or programmable logic controllers (PLCs). However, reduced precision 
representations and fixed-point computations incur several types of errors that must be accounted 
for. These include: 

Quantization Errors: Finite representation errors arise when converting the problem and 
algorithm data from high precision to reduced precision data formats. Potential consequences 
include loss of problem convexity, change of optimal solution and a lack of feasibility with 
respect to the original problem. 

Overflow Errors: Overflow errors occur whenever the number of bits for the integer part in the 
fixed-point representation is too small, and can cause unpredictable behavior of the algorithm. 

Arithmetic Errors: Unlike with floating-point arithmetic, fixed-point addition and subtraction 
operations involve no round-off error provided there is no overflow and the result has the same 
number of fraction bits as the operands p2| . For multiplication, the exact product of two numbers 
with b fraction bits can be represented using 2b fraction bits, hence a 6-bit truncation of a 2's 
complement number incurs a round-off error bounded from below by — 2 6 . Recall that in 2's 
complement arithmetic, truncation incurs a negative error both for positive and negative numbers. 
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B. Notation and Assumptions 

We will use (■) throughout in order to distinguish quantities in a fixed-point representation 
from those in an exact representation and under exact arithmetic. Throughout, we assume for 
simplicity that all variables and problem data are represented using the same number of fraction 
bits b. We further assume that the feasible sets under finite precision satisfy K C K, so that 
solutions in fixed point arithmetic do not produce infeasibility in the original problem due to 
quantization error. 

We conduct separate analyses of both overflow and arithmetic errors for the fast gradient 
method (Algorithm [TJ and ADMM (Algorithm [2]). In both cases, the central requirement is to 
choose the number of fraction bits b large enough to ensure satisfactory numerical behavior. 
We therefore employ two different sets of assumptions depending on the numerical method in 
question: 

Assumption 1 (Fast Gradient Method / Algorithm^: The number of fractions bits b and a 
constant c > 1 are chosen large enough such that 
i) The matrix 

1 



H„ 



c • X ma x.{Hp) 

has a fixed-point representation H n with all of its eigenvalues in the interval (0, 1], where Hp 
is the fixed-point representation of the Hessian Hp, with X max {Hp) its maximum eigenvalue, 
ii) The fixed-point step size $ satisfies 



1 >P > 



K 



(H n ) -l)^K(H n ) +l) *>0 



where n(H n ) is the condition number of H n . 
Assumption 2 (ADMM / Algorithm [2]).- The number of fractions bits b is chosen large enough 
such that 
i) The matrix 



( 


Mil 


M 12 


-i 


Pi 


F T ~ 


) 




ME 


M 22 




F 








is positive semide finite, where p is chosen such that it is exactly representable in b bits, 
ii) The quantization errors in the matrix F are insignificant compared to the model uncertainty. 
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Observe that it is always possible to select b sufficiently large to satisfy all of the preceding 
assumptions, implying that the above conditions represent a lower bound on the number of 
fraction bits required in a fixed-point implementation of our two algorithms to ensure that our 
stability results are valid. Assumptions [T|(|I]) and [2](ji]) ensure that the objective functions (|6]) (for 
the fast gradient method) and (|9]) (for ADMM) remain strongly convex and convex, respectively, 
despite any quantization error. 

In the case of the fast gradient method, Assumption [TJQ guarantees that the true condition 
number of H n is not underestimated, in which case the convergence result of the fast gradient 
method in ([7]) would be invalid. In fact, the assumption ensures that the effective condition 
number for the convergence result is given by 

>«(#„). (13) 
C. Overflow Errors 

In order to avoid overflow errors in a fixed-point implementation, the largest absolute values of 
the iterates' and intermediate variables' components must be known or upper-bounded a priori 
in order to determine the number of bits required for their integer parts. For the static problem 
data (I — H n ), <f> n , 1 + j3, j3, M\\, or Mi 2 , the number of integer bits is easily determined by 
the maximum absolute value in each expression. 

1 ) Overflow Error Bounds in the Fast Gradient Method: 

In the case of the fast gradient method, it is possible to bound analytically the largest absolute 
values of all of the dynamic data, i.e. the variables that change with every iteration. We will 
denote by $„ the fixed-point representation of 

$n = ■ 

c ■ A max (iJp) 

We summarize the upper bounds on variables appearing in the fast gradient method in the 
following proposition. 

Proposition 1: If problem ([6]) is solved by the fast gradient method using the appropriately 
adapted Algorithm [T] then the largest absolute values of the iterates and intermediate variables 
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are given by 



\\ Z i+l 

H n )yi 

I If.; 



< z :-- 



max 



< 



< 



y := z + 



1/inter •" 



I — H„ 



•y, 



(14) 



oo — % • — max 1 1 x 1 1 oo , 



< h :-- 



ll^nlU ■ x, and 



<t'.= Winter + K 



for all i = 0, 1 L 



1. The set X is chosen such that for every state in exact arithmetic 



) -'max 

x G X we have x e X . 

Proof: Follows from interval arithmetic and properties of the vector/matrix 



-norm. 



Note that normalization of the objective as introduced in Section IV-B has no effect on the 



maximum absolute values of the iterates. Furthermore, the bound in ( [14] ) also applies for the 
intermediate elements/cumulative sums in the evaluation of the matrix-vector product. Observe 
that most of the bounds stated in Proposition [T] are tight. 
2) Overflow Error Bounds in ADMM: 

If problem (|9]) is solved using ADMM via Algorithm [2j then we do not know of any general 
method to upper bound the Lagrange multiplier iterates z/j analytically, and consequently are 
unable to establish analytic upper bounds on all expressions involving dynamic data. In this 
case, one must instead estimate the undetermined upper bounds through simulation and add a 
safety factor when allocating the number of integer bits. As a result, with ADMM, we trade 
analytical guarantees on numerical behavior for the capability to solve more general problems. 

D. Arithmetic Round-Off Errors 

We next derive an upper bound on the deviation of an optimal solution z* produced via a 
fixed-point implementation of either Algorithm [T] or [2] from the optimal solutions produced from 
the same algorithms implemented using exact arithmetic. In both cases, we denote by Zi a fixed- 
point iterate. We wish to relate these iterates to the iterates Zi generated under exact arithmetic, 
by establishing a bound in the form 
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with limj^oo Aj finite, where rji := ij — Z{ is the solution error attributable to arithmetic round- 
off error up to the i th iteration. Consequently, we can show that inaccuracy in the computed 
optimal solution induced by arithmetic errors in either algorithm are bounded, which is a crucial 
prerequisite for reliable operation of first-order methods on fixed-point platforms. 

In both cases, we use a control-theoretic approach based on standard Lyapunov methods 
to derive bounds on the solution error arising specifically from fixed-point arithmetic error. For 
simplicity of exposition, we consider only those errors arising from arithmetic errors and neglect 
quantization errors in the analysis. This choice does not alter substantively the results presented 
for either algorithm. Our approach is in contrast to (and more direct than) other approaches to 



error accumulation analysis in the fast gradient method such as [ 33 1, [34], which consider inexact 
gradient computations but do not address arithmetic round-off errors explicitly. In the case of 
ADMM, we are not aware of any existing results relating to error accumulation in fixed-point 
arithmetic. 

1) Stability of Arithmetic Errors in the Fast Gradient Method: 

We consider first the numerical stability of the fast gradient method, by examining in detail 
the arithmetic error introduced at each step of a fixed-point implementation of Algorithm [TJ 
At iteration i, the error in line [2] of Algorithm [T] is given by 

U — U = (I — H n )(ili - Vi) + £t,i 



where e t:i is a vector of errors from the matrix-vector multiplication. Since there are n round-off 
errors in the computation of every component, is componentwise in the interval [— n2~ b ,0]. 

For the projection in line [3} and recalling that KCKisa box, no arithmetic error is introduced. 
Indeed, one can easily verify that the error U — U can only be reduced by multiplication with a 
diagonal matrix diag(e 7r j), with e^ j componentwise in the interval [0, 1]. 

Finally, in line |4j the error induced by fixed-point arithmetic is 

Vi+i - Vi+i = (1 + - PVi + £y,i , 

where two scalar-vector multiplications incur error e y ^ with components in [— 2~ b , 2~ b ] (addition 
and subtraction). Defining the initial error residual terms r\_\ = i] = z — z , and setting 
— zq = yo — yo, one can derive the two-step recurrence 

rji+i = diag(e 7r)i )(/-i7 n ) (?7i+/9(^-^-i)+e 2/ii _i)+ e tji 
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for the accumulated arithmetic error at each iteration. Note that the error 77, at each iteration is 
inherently bounded by the box K. However, in the absence of the projection operation of line [3 
and the associated error truncation, these errors remain bounded. To show this, we can express 
the evolution of the arithmetic error using the two-step recurrence 

(l + P)(l-H n ) -$(I-H n ) 
I 



Vi+i 




Vi 











Vi-i 



=:$« 



6 



+ 



{I -Ha) I 




e y,i-l 







Gt,i 


=:B 


—■Vi 



(15) 



M 



and then show that this linear system is stable. Recalling Assumption [T] which bounds the 
eigenvalues of H n in the interval (0, 1] and j3 in the interval [0, 1), we can use the following 
result: 

Lemma 1: Let C be any symmetric positive definite matrix with maximum eigenvalue less 
than or equal to one. For every constant 7 in the interval [0, 1] the matrix 

(i + 7 )(j-0 -i(i-c) 
1 

is Schur stable, i.e. its spectral radius p(M) is less than one. 

Proof: Assume the eigenvalue decomposition I — C = V T A V, with A diagonal with entries 
Xi G [0, 1). The eigenvalues of M. are unchanged by left- and right-multiplication by [ v v ] and 
its transpose. It is therefore sufficient to examine instead the spectral radius of 

(1 + 7)A -7A 
/ 

Since this matrix has exclusively diagonal blocks, its eigenvalues coincide with those of the 
two-by-two submatrices 

(l+ 7 )Ai - 7 A; 

1 

and it is sufficient to prove that every such submatrix has spectral radius less than one. Note 
that the eigenvalues of M. o,i are the roots of the characteristic equation 

y u 2 -(l+7)A i /i + A i7 = 0. (16) 



M 



D 



M 



DA 



for i 



l,...,n, 
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It is easily verified that a sufficient condition for any quadratic equation in the form 

x 2 + 2bx + c = 

to have roots strictly inside the unit disk is for its coefficients to satisfy i) |6| < 1, ii) c < 1 



and iii) 2\b\ < c + 1. For the eigenvalue solutions to ([16]), this amounts to i) (1 + 7) At/2 < 1, 
ii) Aj7 < 1 and iii) (1 + 7)^ < 'jXi + 1. All three conditions are easily confirmed for the case 

Xi e [0, 1), 7 e [0, l]. ■ 

2) Stability of Arithmetic Errors in ADMM: 

As in the preceding section, for ADMM one can analyze in detail the arithmetic error 
introduced at each step of a fixed-point implementation of Algorithm [2} 

Defining r/i := Zi — Zi, 7, := Vi — Vi, a similar analysis to that of the preceding section produces 
the two-step error recurrence 



Vi+i 




pdiag(e 7rji ) M u 


-diag(e 7Tji ) (M u - 
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Tt+i 




p 2 M n (I -diag(e^)) 


(J-pMn)(J-diag 




= '4i + l 
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Vi 




li 





diag(e 7r , i ) 
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p(I - diag(e 7r> ,) 


) / 







(17) 



— 

=:B 



where: e y> i G [— n2~ b , 0] n is a vector of multiplication errors arising from Algorithm [2J line [2| 
e n t i E [0, 1]™ is a vector of error reduction scalings arising from the projection operation in line[3j 
and e u ,- L G [— 2~ b , 2~ b ] n a vector multiplication errors arising from [4] with _i = 0. Note that 
one can show that even when K is not a box in the presence of soft state constraints, the error 
can only be reduced by the projection operation. The initial iterates of the recurrence relation 
are rj^i = r) , where r] := z - z . 

As in the case of the fast gradient method, these arithmetic errors are inherently bounded by 
the constraint set IK. In the absence of these bounding constraints (so that diag(e 7rj j) = /), one 
can still establish that the arithmetic errors are bounded via examination of the eigenvalues of 
the matrix 

pM n -(Mn-J/) 




N :-. 



(18) 
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Recalling Assumption |2} we have the following result: 



Lemma 2: The matrix N in (18) is Schur stable for any p > 0. 



Proof: The eigenvalues of (18) are either or pAj(Mn), so it is sufficient to show that the 
symmetric matrix Mn satisfies p||Mn|| < 1. Recalling that 



Mn 


M12 




Z 


F T ~ 


ML 


M 22 




F 






where Z := Ha + pi >~ 0, the matrix inversion lemma provides the identity 



Z-3 



/ - Z-iF fr (FZ- 1 F T )- 1 FZ-* 



--: Z-2PZ-2. 



(19) 
li- 1 !!. 



-i 



where P is a projection onto the kernel of FZ 2, hence ||Mn|| < \\Z 2 ||||P||||,Z 2 
It follows that 

p||M u || < P \\(H A + P iy 1 \\ < P - - — -i- — < 1, 

Amin(,-nAj + P 

where A min (i^4) is the smallest eigenvalue of the positive semidefinite matrix Ha- If Ha is 
actually positive definite, then the preceding inequality is strict and the proof is complete. 

Otherwise, to prove that the inequality is strict we must show that 1/ p is not an eigenvalue 
for Mn (which is positive semidefinite by virtue of (fT9]>). Assume the contrary, so that there 
exists some eigenvector v of Mn with eigenvalue 1/p, and some additional (arbitrary) vector q 
that solves the linear system 

r Z F T 
F 

Any solution must then satisfy both Hav E Im(F T ) and v & Ker(F). Consequently v t Hav = 
0, which requires v G Ker(^yi) since Ha is positive semidefinite. Recall that any such v can be 
decomposed into v = (m , . . . , Ujv-i, x , 5q,...,x n , 5 n ). If the quadratic penalty for each 5i is 
positive definite, then v G Ker(if/i) requires each Si = 0. 

Since Fv = 0, the remaining components of v must correspond to a sequence of state and 
inputs compatible with the system dynamics in ([2]), starting from an initial state x$ = 0. Any 
solution v 7^ would then require at least one component Ui 7^ 0. Then v T Hav > ujRui > 
since R is assumed positive definite, a contradiction. 



p ■ v 
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3 ) Arithmetic Errors Bounds for the Fast Gradient Method and ADMM: 

Finally, for both the fast gradient method and ADMM we can use Lemmas [T] and [2] to establish 

an upper bound on the magnitude of error rji for any arithmetic round-off errors that might have 

occurred up to iteration i. 

Proposition 2: Let b be the number of fraction bits and n be the dimension of the decision 



vector. Consider the error dynamics due to arithmetic round-off in ( |15| ) or in ( 17 ), assuming no 
error reduction from projection. The magnitude of any accumulation of round-off errors up to 
iteration i, \\r)i\\ = \\zi — Zi\\, is upper-bounded by 

i-l 

-2-yn{l + n 2 ) J2\\ EAi ^ kB \\ (20) 
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\EA l 



k=0 



for alH = 0, . . . , I n 



1, where matrix E 
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Proof: From the one- step recurrence (15) or (17) we find that 



i-l 



& = A' £ + J2 aMBv ^ i = 0, 1, . . . J max - 1, 



k=0 



such that the result is obtained from applying the properties of the matrix norm. Observe that 
2~ b y/n(l + n 2 ) is the maximum magnitude of Vk for any k = 0, . . . , i — 1. ■ 



Since the matrix A is Schur stable, the bound in ( [20] ) converges. Indeed, the effect of the 
initial error £ decays according to 



| .EM 4 1| oc p{A)\ 



(21) 



whereas the term driven by arithmetic round-off errors in every iteration behaves according to 



i-l 



£ \\EA l - l - k B\ 



p(AY 



k=0 



l-p(A) l-p(A) 



(22) 



This result can be used to choose the number of bits b a priori to meet accuracy specifications 
on the minimizer. 



V. Embedded Hardware Architectures for First-Order Solution Methods 

Amdahl's law p5| states that the potential acceleration of an optimization algorithm through 
parallelization is limited by the fraction of sequential dependencies in the algorithm. First-order 
optimization methods such as the fast gradient method and ADMM have a smaller number of 
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sequential dependencies than interior-point or active-set methods. In fact, a very large fraction of 
the computation involves a single readily parallelizable matrix-vector multiplication, hence the 
expected benefit from parallelization is substantial. Our implementations of both the fast gradient 
method (Algorithm [T]) and ADMM (Algorithm [2]) differ somewhat from more conventional 
implementations of these methods in order to minimize sequential dependencies. Observe that 
in both of our algorithms, the computations of the individual vector components are independent 
and the only communication occurs during matrix-vector multiplication. This allows for efficient 
parallelization given the custom computing and communication architectures discussed next. 
Specifically, we describe a tool that takes as inputs the data type, number of bits, level of 
parallelism and the delays of an adder/subtracter (I a) and multiplier (l M ) and automatically 
generates a digital architecture described in the VHDL hardware description language. 



A. Hardware Architecture for the Fast Gradient Method 

For a fixed-point data type, the parameterized architecture implementing Algorithm [T] for 
problem ([6]) is depicted in Figure [TJ The matrix- vector multiplication is computed in the block 
labeled v T w, which is shown in detail in Figure 2a It consists of an array of Nn u parallel 
multipliers followed by an adder reduction tree of depth [log 2 Nn u ~\ . The architecture for 
performing the projection operation on the set K is shown in Figure 3a It compares the incoming 
value with the upper and lower bounds for that component. Based on the result, the component 
is either saturated or left unchanged. 

The amount of parallelism in the circuit is parameterized by the parameter P. In Figure [Tj 
P=l, meaning that there is parallelism within each dot-product but the that Nn u dot-products 
required for matrix-vector multiplication are computed sequentially. If the level of parallelization 
is increased to P = 2, there will be two copies of the shaded circuit in Figure [T] operating in 
parallel, one computing the odd components of fa and z;, the other computing the even. The 
different blocks communicate through a serial-to-parallel shift register that accepts P serial 
streams and outputs Nn u parallel values for matrix-vector multiplication. These Nn u values are 
the same for all blocks. It takes \^ M -~\ clock cycles to have enough data to start a new iteration, 
hence the number of clock cycles needed to compute one iteration of the fast gradient method 
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Fig. 1: Fast gradient compute architecture. Boxes denote storage elements and dotted lines 
represent Nn u parallel vector links. The dot-product block v T w and the projection block 7T£ 



are depicted in Figures 2a and 3a in detail. FIFO stands for first-in first-out memory and is used 



to hold the values of the current iterate for use in the next iteration. In the initial iteration, the 
multiplexers allow x and $ n through and the result $ n x is stored in memory. In the subsequent 
iterations, the multiplexers allow and / — H n through and <f> n x is read from memory. 




w N 



FIFO 



3> 



(a) Dot-product block with parallel tree archi- (b) Hardware support for warm-starting, which adds one cycle 
tecture. delay. The last entries of the vector are padded with wn, which 

can be constant or depend on previous values. 



Fig. 2: Architectures of dot-product and warm- starting. 



for P e {l,...,Nn u } is 

L F 



Nn,. 



+ Z A flog 2 JVTi tt l + 2l M + 3l A + l 



(23) 



Expression (23) suggests that there will be diminishing returns to parallelization - a conse- 



quence of Amdahl's law. However, (23 ) also suggests that if there are enough resources available, 
the effect of the problem size on increased computational delay is only logarithmic in the worst 
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(a) Box projection block. The total delay (b) Cone projection block. The total delay for each component is 2/^+1. 
from ti to Zi+i is 1^ + 1, x and 8 are assumed to arive and leave in sequence. 



Fig. 3: Projection architectures. A delay of I a cycles is denoted by z 



-Ia 



case. As Moore's law continues to deliver devices with greater transistor densities, the possibility 
of implementing algorithms in a fully parallel fashion for medium size optimization problems 
is becoming a reality. 

B. Hardware Architecture for ADMM 

Algorithm [2] shares the same computational patterns with Algorithm [IJ Matrices M u and M t2 
have the same dense structure as matrices / — H n and $ n , hence the high-level architecture is 
very similar and we do not include it here to avoid replication. The differences lie in the size 
of the matrices, which affect the number of clock cycles to compute one iteration 

L A := j + l A [log 2 (n A )] + l M + 6Z A + 2 , (24) 

where tia '■= N{n u + n x + |«S|) + n x + |<S|, warm- starting support for variables z and v 



(shown in Figure [2b]), and the projection block for supporting soft state constraints described 



in Figure 3b This block performs the projection of the pair (x, 5) onto the set satisfying 
{\x — c\ < r + 5, S > 0} by using an explicit solution map for the projection operation and 
computing the search procedure efficiently. In fact, only I a extra cycles are needed compared to 
the standard hard-constrained projection. The block performs a set of comparisons that are used 
to drive the select signal of a multiplexer. 

Note that since multiplication and division by powers of two requires no resources in hardware 
(just a reinterpretation of an array of signals), if p is restricted to be a power of two, no hardware 
multipliers are required in ADMM outside of the matrix-vector multiplication block. Table [I] 
compares the resources required to implement the two architectures. Again, with ADMM we 
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TABLE I: Resources required for the fast gradient and ADMM computing architectures. 





Fast gradient 


ADMM 


multipliers 


P [Nn u + 2] 


Pn A 


adders/subtracters 


P [Nn u + 3] 


P [n A + 15] 


memory blocks 


P [Nn u + 72^+4] 


P [n A + 8] 


size of memory blocks 




m 



trade higher resource requirements and longer delays for the capability to solve more general 
problems. 

Note that in a custom hardware implementation of either of our two methods, the number 
of execution cycles per iteration is exact. We also employ a fixed number of iterations in our 
implementations of both algorithms, rather than implementing a numerical convergence test, 
since such convergence tests represent a somewhat self-defeating computational bottleneck in a 
hard real-time context. Providing cycle accurate completion guarantees is critical for reliability 
in high-speed real-time applications p6| . 

VI. Numerical Benchmark Study 

We reported an implementation of the fast gradient architecture in the preliminary publica- 
tion [19 1 to implement an input-constrained MPC controller for a real- world, highly dynamic 
positioning system inside an atomic force microscope requiring a sampling rate in excess of 
1MHz. In this paper, for easier comparison with the existing literature, we use a widely studied 



benchmark example consisting of a set of oscillating masses attached to walls [ flOf , p7| , as 
illustrated by Figure |4j The system is sampled every 0.5 seconds assuming a zero-order hold 
and the masses and the spring constants have a value of 1kg and INm -1 , respectivel}^] The 
system has four control inputs and two states for each mass, its position and velocity, for a total 
of eight states. The goal of the controller, with parameters N = 10, Q = I and R = I, is to 
track a reference for the position of each mass while satisfying the system limits. 

We consider first the case where the control inputs are constrained to the interval [—0.5, 0.5] 
and the optimization problem ([6]) with 40 optimization variables is solved via the fast gradient 



2 Note that we choose this sampling time and parameter set for ease of comparison to other published results. Our implemented 
methods require computation times on the order of Ifis, as we report later in this section. 
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|vQsAQvQvf>A| 

Fig. 4: Oscillating masses example. 



method. Secondly, we consider additional hard constraints on the rate of change in the inputs on 
the interval [—0.1,0.1] and soft constraints on the states corresponding to the mass positions on 
the interval [—0.5,0.5]. The remaining states are left unconstrained. The state is augmented to 
enforce input-rate constraints, and the further inclusion of slack variables increases the dimension 
of the state vector to n x = 12. Note that for problems of this size, MPC control designs based 
on parametric programming |3}, Q are generally not tenable, necessitating online optimization 
methods. The resulting problem with 216 optimization variables in the form ([9]) is solved via 
ADMM. The closed-loop trajectories using an MPC controller based on a double precision 
solver running to optimality are shown in Figure [5j where all the constraints become active for a 
significant portion of the simulation. We do not include any disturbance model in our simulation, 
although the presence of an exogenous disturbance signal would not lead to infeasibility since 
the MPC implementation includes only soft-constrained states. Trajectories arising from closed- 
loop simulation using a controller based on our fixed-point methods are indistinguishable from 
those in Figure [5j so are excluded for brevity. 

As a reference for later comparison, an input-constrained problem with two inputs and 10 
states, formulated as an optimization problem of the form ([6]) with 40 variables, was solved 



in [37 1 using the fast gradient method in approximately 50 ^seconds. In terms of state-constrained 



implementations, a problem with three inputs and 12 states, formulated as a sparse quadratic 



program with hard state constraints and 300 variables, was solved in [ 10 1 using an interior-point 
method reporting computing times in the region of 5 milliseconds, while the state constraints 
remained inactive. In both cases, the solvers were implemented in software on high-performance 
desktop machines. 

Our goal is to choose the minimum number of bits and solver iterations such that the 
closed-loop performance is satisfactory while minimizing the amount of resources needed to 
achieve certain sampling frequencies. Figure [6] shows the convergence behavior of the fast 
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(a) Trajectory with 21 samples hitting the input con- (b) Trajectory with 11, 28 and 14 samples hitting 
straints. the input, rate and output constraints, respectively. 

Fig. 5: Closed-loop trajectories showing actuator limits, desirable output limits and a time- varying 
reference. MPC allows for optimal operation on the constraints. 
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Fig. 6: Theoretical error bounds given by (20) and practical convergence behavior of the fast 
gradient method (left) and ADMM (right) under different number representations. 



gradient method and ADMM for two samples in the simulation with an actively constrained 
solution. The theoretical error bounds on the residual round-off error rji, given by ( [20] ), allow 
one to make practical predictions for the actual error for a given number of bits, which, as 
predicted by Lemma [2] and pTj ) and ( |22] ), converges to a finite value. Table [TT] shows the relative 
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TABLE II: Percentage difference in average closed-loop cost with respect to a standard double 
precision implementation. In each table, b is the number of fraction bits employed and J max is 
the (fixed) number of algorithm iterations. In certain cases, the error increases with the number 
of iterations due to increasing accumulation of round-off errors. 





10 


12 


14 


16 


18 


20 


^max \ & 


10 


12 


14 


16 


18 


20 


5 


5.30 


2.76 


2.87 


3.03 


3.05 


3.06 


10 


53.49 


0.18 


1.17 


0.68 


0.57 


0.58 


10 


14.53 


0.14 


0.06 


0.18 


0.20 


0.02 


15 


47.84 


0.46 


1.08 


0.63 


0.51 


0.49 


15 


17.04 


0.35 


0.25 


0.04 


0.00 


0.01 


20 


44.79 


0.76 


0.95 


0.57 


0.45 


0.42 


20 


16.08 


0.15 


0.19 


0.06 


0.01 


0.00 


25 


47.03 


0.98 


0.86 


0.51 


0.39 


0.37 


25 


17.27 


0.15 


0.19 


0.05 


0.01 


0.00 


30 


45.17 


1.02 


0.82 


0.46 


0.35 


0.32 


30 


16.90 


0.31 


0.21 


0.03 


0.02 


0.00 


35 


46.02 


1.07 


0.81 


0.43 


0.31 


0.28 


35 


18.44 


0.19 


0.22 


0.05 


0.01 


0.00 


40 


46.87 


1.29 


0.74 


0.41 


0.28 


0.25 



(a) FGM (b) ADMM 



difference in closed-loop tracking performance for different fixed-point fast gradient and ADMM 
controllers compared to the optimal controller. Assuming that a relative error smaller than 0.05% 
is desirable, using 15 solver iterations and 16 fraction bits would be a suitable choice for the 
fast gradient method. The problem ([9]) solved via ADMM appears more vulnerable to reduced 
precision implementation, although satisfactory control performance can still be achieved using 
a surprisingly small number of bits. In this case, employing more than 18 fraction bits or more 
than 40 ADMM iterations results in insignificant improvements. 

For the implementation of ADMM there are a number of tuning parameters left to the control 
designer. Setting the regularization parameter to p = 2 simplifies the implementation and pro- 
vided good convergence behavior. The maximum observed value for the Lagrange multipliers v 
was 7.8, so the penalty parameter a x was set to o\ = 8 to obtain an exact penalty formulation as 
described by Theorem [JJ In Section III-C it was noted that the convergence of ADMM can be 
very slow when there is large mismatch between the size of the primal and dual variables. This 
problem can be largely avoided by scaling the matching condition ( [TO] ) with a diagonal matrix, 
where the entries associated with the soft-constrained states and the slack variables are assigned 
a and the rest are assigned 1. This scaling procedure correspond to variable transformations 
y = Dy and z = Dz that can be applied offline. 
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In order to evaluate the potential computing performance the architectures described in Sec- 
tion [V] were implemented in FPGAs. For a fixed number of iterations one can calculate the 



execution time of the solver deterministically according to (23) or (24). The field programmable 
gate array (FPGA) designs can be clocked at more than 400 MHz using chips from Xilinx's 
high-performance Virtex 6 family or at more than 230 MHz using devices from the low cost and 



low power Spartan 6 family. Table III shows the achievable sampling times on the two families 
for different levels of parallelization. The resource usage is stated in terms of the number of 
embedded multiplier blocks since this is the limiting resource in these designs. For the input- 
constrained problem solved via the fast gradient method, one can achieve sampling rates beyond 
1 MHz with Virtex 6 devices using a modest amount of parallelization. One can also achieve 
sampling rates in the region of 700 kHz with Spartan 6 devices consuming in the region of 1 W 
of power. For the state-constrained problem solved via ADMM, since the number of variables 
is significantly larger, larger devices are needed and longer computational times have to be 
tolerated. In this case, achievable solution times range from 40kHz to 200kHz for different 
Virtex 6 devices. 

Note that the fastest performance numbers reported in the literature are in the millisecond 
region, several orders of magnitude slower than what is achievable using the techniques presented 
in this paper. 
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